IntLim vignette

Mingrui Liu, Ewy Mathe, Jalal Siddiqui

2017-08-01

Introduction

Data Input Format

All files listed in the CSV file, and the CSV file itself, should be placed in the same folder.
The software will automatically retrieve the file paths of the input files (based on the location of the CSV file). Note also that the input data files should be in a specific format:

- metabData: rows are metabolites, columns are samples
- geneData: rows are genes, columns are samples
- metabMetaData: rows are metabolites, features are columns
- geneMetaData: rows are genes, features are columns
- sampleMetaData: rows are samples, features are columns

In addition, the first column of the sampleMetaData file is assumed to be the sample id, and those sample ids should match the columns of metabData and geneData (i.e. all sample ids in metabData and geneData must also be present in the sampleMetaData file).
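The layout requirements above can be checked before the files are handed to the package. The following is a minimal base R sketch on toy data, not part of IntLim: the object contents and the helper `ids_covered` are hypothetical and only illustrate the expected orientation and the sample id rule.

```r
# Toy inputs in the orientation described above (all values are made up).
set.seed(1)
samples <- paste0("S", 1:4)

# metabData / geneData: features in rows, samples in columns
metabData <- matrix(rnorm(8), nrow = 2,
                    dimnames = list(c("metab1", "metab2"), samples))
geneData <- matrix(rnorm(12), nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"), samples))

# sampleMetaData: samples in rows, first column holds the sample id
sampleMetaData <- data.frame(id = samples,
                             group = c("PBO", "PBO", "Leukemia", "Leukemia"),
                             stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when every sample id used as a column name in
# `dat` also appears in the first column of `meta`.
ids_covered <- function(dat, meta) {
  all(colnames(dat) %in% meta[[1]])
}

metab_ok <- ids_covered(metabData, sampleMetaData)
gene_ok <- ids_covered(geneData, sampleMetaData)
stopifnot(metab_ok, gene_ok)
```

Running a check like this on the real tables, after reading them with `read.csv`, makes a mismatched sample id easier to spot before the analysis starts.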

Running the user-friendly shiny web application

To start up the app, simply type:

    runIntLimApp()

Example Workflow

A small data set is embedded in the package. To access it:

     dir <- system.file("extdata", package="IntLim", mustWork=TRUE)
     csvfile <- file.path(dir, "NCItestinput.csv")
     csvfile
## [1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/IntLim/extdata/NCItestinput.csv"

Read in the data, organize it into a MultiDataSet object, and print out some statistics:

inputData <- IntLim::ReadData(csvfile,metabid='id',geneid='id')
## [1] "CreateMultiDataSet created"
IntLim::ShowStats(inputData)
##   Num_Genes Num_Metabolites Num_Samples_withGeneExpression
## 1      1448             257                             20
##   Num_Samples_withMetabolomics Num_Samples_inCommon
## 1                           20                   20

Optionally, features (genes or metabolites) can be filtered based on their mean values. Users supply a percentile cutoff, and any feature with a mean value below that cutoff is removed.

inputDatafilt <- IntLim::FilterData(inputData,geneperc = 0.15)
## [1] "No metabolite filtering by percentile is applied"
## [1] "No metabolite filtering by missing values is applied"
IntLim::ShowStats(inputDatafilt)
##   Num_Genes Num_Metabolites Num_Samples_withGeneExpression
## 1      1230             257                             20
##   Num_Samples_withMetabolomics Num_Samples_inCommon
## 1                           20                   20
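
The percentile filter described above can be sketched in base R. This is an illustration of the idea on toy data, assuming the cutoff is the given quantile of the per-feature means; the helper `filter_by_mean` is hypothetical and is not the FilterData implementation.

```r
# Toy expression matrix: 10 genes (rows) x 5 samples (columns)
set.seed(42)
expr <- matrix(rexp(50), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("S", 1:5)))

# Hypothetical helper: drop features whose mean is below the requested
# percentile of all feature means; keep the rest.
filter_by_mean <- function(mat, perc) {
  feature_means <- rowMeans(mat, na.rm = TRUE)
  cutoff <- stats::quantile(feature_means, probs = perc, na.rm = TRUE)
  mat[feature_means >= cutoff, , drop = FALSE]
}

filtered <- filter_by_mean(expr, perc = 0.15)
nrow(expr)      # number of features before filtering
nrow(filtered)  # number of features after filtering
```

With a cutoff of 0.15, roughly the lowest 15% of features by mean are removed, which is in line with the drop from 1448 to 1230 genes reported by ShowStats.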

Verify the distribution of the input data:

IntLim::PlotDistributions(inputData)

Evaluate the clustering of the data via an unsupervised principal components analysis (PCA):

IntLim::PlotPCA(inputData,stype = "PBO_vs_Leukemia")
## Warning in RColorBrewer::brewer.pal(numcateg, palette): minimal value for n is 3, returning requested palette with 3 different levels
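
For readers who want to see what such a PCA involves, here is a minimal sketch using base R's `prcomp` on simulated data. It is a stand-in for illustration and may not match what PlotPCA does internally; the data, group shift, and object names are all assumptions.

```r
# Simulated data: 50 features (rows) x 20 samples (columns), with the same
# group sizes as the example data set (14 PBO, 6 Leukemia).
set.seed(7)
groups <- rep(c("PBO", "Leukemia"), times = c(14, 6))
dat <- matrix(rnorm(50 * 20), nrow = 50,
              dimnames = list(paste0("f", 1:50), paste0("S", 1:20)))

# Shift some features in one group so that there is structure to find.
is_leuk <- groups == "Leukemia"
dat[1:20, is_leuk] <- dat[1:20, is_leuk] + 4

# prcomp expects observations (samples) in rows, so transpose first.
pca <- prcomp(t(dat), center = TRUE, scale. = TRUE)

# Proportion of variance carried by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample scores on the first two components, ready for plotting
scores <- data.frame(sample = rownames(pca$x),
                     group = groups,
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2],
                     stringsAsFactors = FALSE)
head(scores)
```

Plotting `PC1` against `PC2` and coloring the points by `group` gives a view comparable in spirit to the one above.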

Run the linear models and plot distribution of p-values:

myres <- IntLim::RunIntLim(inputData,stype="PBO_vs_Leukemia")
## [1] "Running the analysis on"
## 
## Leukemia      PBO 
##        6       14 
## [1] "10 % complete"
## [1] "20 % complete"
## [1] "30 % complete"
## [1] "40 % complete"
## [1] "50 % complete"
## [1] "60 % complete"
## [1] "70 % complete"
## [1] "80 % complete"
## [1] "90 % complete"
##    user  system elapsed 
##  14.092   0.169  14.314
IntLim::DistPvalues(myres)
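
To make the modeling step more concrete, the sketch below fits one linear model for a single simulated gene-metabolite pair with base R's `lm`. It assumes the model takes the form metabolite ~ gene + phenotype + gene:phenotype, with the interaction term carrying the phenotype-specific association; this form, the simulated data, and the variable names are assumptions for illustration rather than a copy of RunIntLim.

```r
set.seed(123)
n <- 20
pheno <- factor(rep(c("PBO", "Leukemia"), times = c(14, 6)),
                levels = c("PBO", "Leukemia"))
gene <- rnorm(n)

# Simulate a metabolite whose relationship to the gene flips between groups.
slope <- ifelse(pheno == "PBO", 1, -1)
metab <- slope * gene + rnorm(n, sd = 0.3)

# Fit the model with a gene-by-phenotype interaction term.
fit <- lm(metab ~ gene + pheno + gene:pheno)
coefs <- summary(fit)$coefficients

# P-value of the interaction coefficient for this pair
interaction_p <- coefs["gene:phenoLeukemia", "Pr(>|t|)"]
interaction_p
```

Repeating such a fit for every gene-metabolite pair yields one interaction p-value per pair, which is the kind of distribution a p-value histogram summarizes.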

Process the results and filter gene-metabolite pairs based on adjusted p-values and differences in correlation coefficients between groups 1 and 2. Then plot a heatmap of the significant gene-metabolite pairs:

myres <- IntLim::ProcessResults(myres,inputData)
IntLim::CorrHeatmap(myres)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
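
The filtering logic described above (adjusted p-value plus difference in correlation between the two groups) can be sketched on a toy results table. The table values, column names, and the two cutoffs below are illustrative assumptions, not output or defaults taken from ProcessResults.

```r
# Toy results for five gene-metabolite pairs (all values are made up)
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  metab = c("m1", "m1", "m2", "m2", "m3"),
  pval = c(0.0001, 0.002, 0.03, 0.2, 0.0005),
  corr_group1 = c(0.8, 0.1, -0.6, 0.3, 0.4),
  corr_group2 = c(-0.7, 0.2, 0.5, -0.4, 0.3),
  stringsAsFactors = FALSE
)

# Adjust the p-values for multiple testing (Benjamini-Hochberg FDR)
res$padj <- p.adjust(res$pval, method = "fdr")

# Absolute difference in correlation between the two groups
res$diff_corr <- abs(res$corr_group1 - res$corr_group2)

# Illustrative cutoffs: keep pairs that pass both criteria
pval_cutoff <- 0.05
diff_cutoff <- 0.5
keep <- res[res$padj <= pval_cutoff & res$diff_corr >= diff_cutoff, ]
keep$gene
```

Requiring both criteria removes pairs that are statistically significant but show nearly the same correlation in both groups, such as `g2` and `g5` in this toy table.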

Plot a pair of interest:

IntLim::PlotGMPair(inputData,stype="PBO_vs_Leukemia","DLG4","(p-Hydroxyphenyl)lactic acid")

Lastly, various writing functions are implemented. One can write the output after data filtering using the following function (the file will be written to the home directory):

IntLim::OutputData(inputData=inputDatafilt,filename="~/FilteredData.zip")

Users can also write the results file:

IntLim::OutputResults(inputResults=myres,filename="~/MyResults.zip")
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X Yosemite 10.10.5
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] IntLim_0.1.0    devtools_1.13.2
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.0            plyr_1.8.4                
##   [3] igraph_1.0.1               lazyeval_0.2.0            
##   [5] splines_3.3.2              BiocParallel_1.6.6        
##   [7] crosstalk_1.0.0            GenomeInfoDb_1.8.7        
##   [9] ggplot2_2.2.1              digest_0.6.12             
##  [11] foreach_1.4.3              highcharter_0.5.0         
##  [13] htmltools_0.3.6            viridis_0.4.0             
##  [15] gdata_2.18.0               magrittr_1.5              
##  [17] memoise_1.1.0              cluster_2.0.6             
##  [19] gclus_1.3.1                limma_3.28.21             
##  [21] Biostrings_2.40.2          annotate_1.50.1           
##  [23] matrixStats_0.52.2         xts_0.9-7                 
##  [25] siggenes_1.46.0            colorspace_1.3-2          
##  [27] dplyr_0.7.2                RCurl_1.95-4.8            
##  [29] jsonlite_1.5               genefilter_1.54.2         
##  [31] bindr_0.1                  GEOquery_2.38.4           
##  [33] survival_2.41-3            zoo_1.8-0                 
##  [35] iterators_1.0.8            glue_1.1.1                
##  [37] registry_0.3               gtable_0.2.0              
##  [39] zlibbioc_1.18.0            XVector_0.12.1            
##  [41] kernlab_0.9-25             prabclus_2.2-6            
##  [43] BiocGenerics_0.18.0        quantmod_0.4-9            
##  [45] DEoptimR_1.0-8             scales_0.4.1              
##  [47] mvtnorm_1.0-6              DBI_0.6-1                 
##  [49] rngtools_1.2.4             Rcpp_0.12.12              
##  [51] MultiDataSet_1.0.2         viridisLite_0.2.0         
##  [53] xtable_1.8-2               bumphunter_1.12.0         
##  [55] foreign_0.8-68             mclust_5.3                
##  [57] preprocessCore_1.34.0      stats4_3.3.2              
##  [59] htmlwidgets_0.9            httr_1.2.1                
##  [61] gplots_3.0.1               RColorBrewer_1.1-2        
##  [63] fpc_2.1-10                 modeltools_0.2-21         
##  [65] pkgconfig_2.0.1            reshape_0.8.6             
##  [67] XML_3.98-1.8               flexmix_2.3-14            
##  [69] nnet_7.3-12                locfit_1.5-9.1            
##  [71] labeling_0.3               rlang_0.1.1               
##  [73] reshape2_1.4.2             AnnotationDbi_1.34.4      
##  [75] munsell_0.4.3              tools_3.3.2               
##  [77] RSQLite_1.1-2              broom_0.4.2               
##  [79] evaluate_0.10              stringr_1.2.0             
##  [81] yaml_2.1.14                heatmaply_0.10.1          
##  [83] knitr_1.16                 beanplot_1.2              
##  [85] robustbase_0.92-7          caTools_1.17.1            
##  [87] purrr_0.2.2.2              dendextend_1.5.2          
##  [89] bindrcpp_0.2               nlme_3.1-131              
##  [91] doRNG_1.6.6                mime_0.5                  
##  [93] whisker_0.3-2              nor1mix_1.2-2             
##  [95] biomaRt_2.28.0             plotly_4.7.1              
##  [97] tibble_1.3.3               stringi_1.1.5             
##  [99] GenomicFeatures_1.24.5     minfi_1.18.6              
## [101] lattice_0.20-35            trimcluster_0.1-2         
## [103] Matrix_1.2-10              psych_1.7.5               
## [105] multtest_2.28.0            data.table_1.10.4         
## [107] bitops_1.0-6               seriation_1.2-2           
## [109] httpuv_1.3.3               rtracklayer_1.32.2        
## [111] GenomicRanges_1.24.3       R6_2.2.2                  
## [113] TSP_1.1-5                  KernSmooth_2.23-15        
## [115] gridExtra_2.2.1            IRanges_2.6.1             
## [117] codetools_0.2-15           MASS_7.3-47               
## [119] gtools_3.5.0               assertthat_0.2.0          
## [121] SummarizedExperiment_1.2.3 rprojroot_1.2             
## [123] openssl_0.9.6              pkgmaker_0.22             
## [125] withr_1.0.2                GenomicAlignments_1.8.4   
## [127] Rsamtools_1.24.0           mnormt_1.5-5              
## [129] S4Vectors_0.10.3           rlist_0.4.6.1             
## [131] diptest_0.75-7             parallel_3.3.2            
## [133] quadprog_1.5-5             grid_3.3.2                
## [135] tidyr_0.6.3                base64_2.0                
## [137] class_7.3-14               rmarkdown_1.6             
## [139] illuminaio_0.14.0          git2r_0.18.0              
## [141] TTR_0.23-1                 Biobase_2.32.0            
## [143] shiny_1.0.3                lubridate_1.6.0